Abstract:

Minimizing errors of the physical parameters of interest should be the ultimate goal of
any event selection optimization in high energy physics data analysis involving parameter 
determination. Quick and reliable error estimation is a crucial ingredient for realizing 
this goal. In this paper we derive a formalism for direct evaluation of measurement errors 
using the signal probability density function and large fully simulated signal and back- 
ground samples without need for data fitting and background modelling. We illustrate 
the elegance of the formalism in the case of event selection optimization for CP violation 
measurement in B decays. The implications of this formalism for choosing event variables
for data analysis are discussed.
1 Introduction



While performing event selection optimization has become common practice in high
energy physics data analysis, a less well known fact is that the optimization problem
is often poorly formulated. Much effort has gone into developing complicated
techniques [1, 2] to separate signal from background in the multi-dimensional space of
event variables, but much less attention is paid to the more fundamental questions: what
is the goal of event selection optimization and how can we quantify it?

In principle, event selection optimization in an analysis measuring particle properties
should serve the true purpose of the analysis, i.e. to produce the most accurate
and precise physics results [3]. In practice we have to refine this goal to make it
quantifiable. First of all, it is hardly possible to quantify how systematic uncertainties
change with the selection criteria at the optimization stage. Concerning event selection,
the requirement to control systematic uncertainties is supposed to be qualitatively
fulfilled by imposing constraints on the allowed selection criteria and choices of fitting
variables. The selection optimization problem effectively becomes the problem of finding
the selection criteria that minimize statistical uncertainties on the parameters of interest
under the necessary constraints. Unless otherwise stated, we will use the word error to refer
to statistical uncertainty in this paper. Secondly, optimization is usually implemented by
maximizing or minimizing a single objective function. If several physical parameters
are measured in one analysis, there are different strategies to define the objective function,
for example by focusing on the most important parameter, by using a weighted linear
sum of different objective functions, or by using the determinant of the covariance matrix
of all the parameters of interest. A fully determined covariance matrix is sufficient for
the evaluation of all measurement errors and of any form of objective function.
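
As an illustration of the three strategies, the following Python sketch evaluates each
objective function from a given covariance matrix. The function names, the example
covariance matrix and the weights are hypothetical, and the weighted sum is taken here
to be a weighted sum of the parameter variances, which is only one possible reading.

```python
import numpy as np

def objective_single(cov, index=0):
    """Variance of the most important parameter (to be minimized)."""
    return float(cov[index, index])

def objective_weighted(cov, weights):
    """Weighted linear sum of the parameter variances (to be minimized)."""
    return float(np.dot(weights, np.diag(cov)))

def objective_determinant(cov):
    """Determinant of the covariance matrix (to be minimized)."""
    return float(np.linalg.det(cov))

# Hypothetical covariance matrix of two parameters of interest.
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])

print(objective_single(cov))
print(objective_weighted(cov, [0.5, 0.5]))
print(objective_determinant(cov))
```

All three functions take the same covariance matrix as input, which is why a fully
determined covariance matrix is all that the optimization needs.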

In a usual analysis, a mathematical model describing both signal and background
events needs to be identified first. Then statistical techniques (see e.g. [4] and [5]) such
as the maximum likelihood method are used to fit the model to some data and find the
parameter estimates. Finally the parameter covariance matrix is evaluated using the
parameter estimates, for example by evaluating the Hessian matrix of the log-likelihood
function at the maximum likelihood solution point. While in principle guaranteed
to work and "straightforward" to implement, this procedure is very inefficient or even
unfeasible for optimization tasks that require iterating the procedure many times to find
the optimal selection criteria. It is difficult to dynamically identify a model that accurately
describes the criteria-dependent background. Even if such a model can be found, the time
needed to perform the large number of fits is usually prohibitive. A more efficient
way to perform the optimization should be sought. The crucial ingredient of event selection
optimization is a quick and reliable estimation of the parameter errors (or the covariance
matrix).
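
To make the usual procedure concrete, here is a minimal Python sketch for a one-parameter
toy problem: an exponential decay p.d.f. with decay rate gamma, fitted by maximum
likelihood, with the variance taken from the second derivative of the negative
log-likelihood at the solution point. The data values and function names are hypothetical.
In a real optimization this fit-then-differentiate step would have to be repeated for
every candidate set of selection criteria.

```python
import numpy as np

def neg_log_likelihood(gamma, t):
    """-ln L for the exponential p.d.f. P(t; gamma) = gamma * exp(-gamma * t)."""
    return -np.sum(np.log(gamma) - gamma * t)

def second_derivative(func, x, step=1e-3):
    """Central finite-difference estimate of the second derivative of func at x."""
    return (func(x + step) - 2.0 * func(x) + func(x - step)) / step**2

# Hypothetical decay times (arbitrary units), standing in for a fitted data sample.
t = np.array([0.3, 1.2, 0.7, 2.5, 0.9, 1.6, 0.4, 3.1])

gamma_hat = 1.0 / np.mean(t)  # analytic maximum likelihood estimate
hessian = second_derivative(lambda g: neg_log_likelihood(g, t), gamma_hat)
variance = 1.0 / hessian      # one-parameter covariance "matrix"

print(gamma_hat, np.sqrt(variance))
```

For this toy the result can be cross-checked against the analytic variance
gamma_hat**2 / N.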
2 Error estimation in simulation-based optimization



Most often event selection optimization is pursued using large samples of fully simulated 
signal and background events. Therefore the true values of the physical parameters used to 
generate the simulation data are already known. We can skip the fitting step and directly 
evaluate what the errors of these parameters would be if they were estimated by fitting 
a model to the data. The errors obtained this way are minimized in the optimization 
process. 



2.1 The general formalism 

Suppose we have a selected data sample of both signal and background events, each
consisting of a set of k measured quantities $X = (x_1, \ldots, x_k)$. Let $P_s(X; \theta)$ and $P_b(X)$ be the
probability density functions (p.d.f.) describing the signal and background events
respectively, where $\theta = (\theta_1, \ldots, \theta_m)$ is a set of parameters whose true values are unknown in real
data analysis but accessible in simulation-based optimization. $\theta$ includes the physical
parameters we are interested in and possibly also some signal-related experimental
parameters that need to be determined from the fit. Here we have assumed that the background
p.d.f. is totally fixed using sidebands, control channels or full simulation and thus has no
fit parameters.

The p.d.f. describing the total sample can be written as

$$P_t(X; \theta, F) = F \cdot P_s(X; \theta) + (1 - F) \cdot P_b(X) \qquad (1)$$

where F is the fraction of signal events in the sample, called the global purity. This kind
of parameter estimation problem is usually solved by maximizing the likelihood function

$$L(\theta, F) = \prod_{l=1}^{N_s+N_b} P_t(X_l; \theta, F) \qquad (2)$$

where $N_s$ ($N_b$) is the number of signal (background) events in the selected sample.
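
The total p.d.f. and the logarithm of the likelihood function can be sketched in Python
as follows. The one-dimensional signal and background shapes are hypothetical
placeholders chosen only to make the example runnable; any normalized p.d.f.s could be
passed in instead.

```python
import numpy as np

def total_pdf(x, theta, purity, signal_pdf, background_pdf):
    """Eq. (1): P_t = F * P_s(x; theta) + (1 - F) * P_b(x)."""
    return purity * signal_pdf(x, theta) + (1.0 - purity) * background_pdf(x)

def log_likelihood(x, theta, purity, signal_pdf, background_pdf):
    """Logarithm of Eq. (2): sum of ln P_t over all selected events."""
    p_t = total_pdf(x, theta, purity, signal_pdf, background_pdf)
    return float(np.sum(np.log(p_t)))

# Hypothetical shapes on the interval [0, 1]: a signal p.d.f. that is linear
# in the parameter theta, and a flat background p.d.f.
def signal_pdf(x, theta):
    return 1.0 + theta * (x - 0.5)

def background_pdf(x):
    return np.ones_like(x)

x = np.array([0.1, 0.4, 0.8])  # hypothetical measured values
print(log_likelihood(x, 0.5, 0.6, signal_pdf, background_pdf))
```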

In most high energy physics data analyses F and $\theta$ are determined using different
observables, resulting in weak and negligible correlations between them. Using this fact,
we can approximate the inverse $V^{-1}$ of the covariance matrix $V_{ij} = \mathrm{cov}[\theta_i, \theta_j]$ using the
Hessian matrix of the log-likelihood function with respect to $\theta$ only:

$$(V^{-1})_{ij} \approx H_{ij} = -\frac{\partial^2 \ln L}{\partial\theta_i \partial\theta_j} = -\sum_{l=1}^{N_s+N_b} \frac{\partial^2 \ln P_t(X_l; \theta, F)}{\partial\theta_i \partial\theta_j} \qquad (3)$$

The Hessian matrix should be evaluated at the true values of the parameters in principle, 
but can only be evaluated at the maximum likelihood solution point in real data analysis. 
Since the true values of all parameters can be known in simulation-based optimization, 
we have no need to perform data fitting for evaluation of the Hessian matrix. 
Using Eq. (1), the Hessian matrix becomes

$$H_{ij} = \sum_{l=1}^{N_s+N_b} \left[ f_l (f_l - 1) \, \frac{\partial \ln P_s(X_l; \theta)}{\partial\theta_i} \, \frac{\partial \ln P_s(X_l; \theta)}{\partial\theta_j} - f_l \, \frac{\partial^2 \ln P_s(X_l; \theta)}{\partial\theta_i \partial\theta_j} \right] \qquad (4)$$

where

$$f_l = \frac{F \cdot P_s(X_l; \theta)}{F \cdot P_s(X_l; \theta) + (1 - F) \cdot P_b(X_l)} \qquad (5)$$

is the local purity in the vicinity of the l-th event, which can be estimated numerically,
for example by counting the number of signal and background events in a zone containing
this event.
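
One simple way to realize the counting estimate is to use histogram bins as zones. The
Python sketch below does this for a single observable. It assumes that the simulated
signal and background samples are normalized to the same integrated luminosity and that
all events lie inside the binning range. The function name and the example numbers are
hypothetical.

```python
import numpy as np

def local_purity(x_signal, x_background, edges):
    """Local purity f_l of each signal event, estimated by counting signal and
    background events in the histogram bin (zone) that contains the event."""
    n_sig, _ = np.histogram(x_signal, bins=edges)
    n_bkg, _ = np.histogram(x_background, bins=edges)
    # Bin index of every signal event; the clip puts events sitting exactly on
    # the last edge into the last bin, as np.histogram does.
    idx = np.clip(np.digitize(x_signal, edges) - 1, 0, len(edges) - 2)
    return n_sig[idx] / (n_sig[idx] + n_bkg[idx])

# Hypothetical values of one observable for signal and background events.
x_signal = np.array([0.1, 0.2, 0.6, 0.7, 0.9])
x_background = np.array([0.05, 0.15, 0.3, 0.8])
edges = np.array([0.0, 0.5, 1.0])

print(local_purity(x_signal, x_background, edges))
```

Finer zones follow the purity variation more closely but suffer from larger statistical
fluctuations, so the zone size is a choice that depends on the available simulation
statistics.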

In the large sample limit, we have

$$\sum_{l=1}^{N_s+N_b} f_l \cdot G(X_l) = \sum_{l=1}^{N_s} G(X_l) \qquad (6)$$



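for any function G(X), where the sum on the right-hand side runs over the signal events
only. Assuming this relation holds for moderate sample sizes, we can rewrite Eq. (4) as

$$H_{ij} = \sum_{l=1}^{N_s} \left[ (f_l - 1) \, \frac{\partial \ln P_s(X_l; \theta)}{\partial\theta_i} \, \frac{\partial \ln P_s(X_l; \theta)}{\partial\theta_j} - \frac{\partial^2 \ln P_s(X_l; \theta)}{\partial\theta_i \partial\theta_j} \right] \qquad (7)$$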



which is the master formula for evaluation of the Hessian matrix. An important feature
of this result is that any element of the Hessian matrix can be expressed as a sum of
contributions from individual signal events, each of which can be evaluated using the
signal p.d.f. and the local purity factor. There is no need to model the background as all
information about background is contained in the local purity factor dynamically evaluated
from fully simulated data. Once the Hessian matrix is estimated, it is trivial to invert it to
get the covariance matrix V for the parameters $\theta$. Errors of physical parameters and eventually
the optimization objective function can be easily evaluated.
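
A minimal Python sketch of this evaluation is given below. It takes, for each signal
event, the first and second derivatives of ln P_s with respect to the parameters and the
local purity, sums the per-event contributions and inverts the result. The function
names and the numerical inputs are hypothetical. For illustration the second derivatives
are set to the values they take when the signal p.d.f. is linear in all parameters.

```python
import numpy as np

def hessian_from_signal(grad, second, purity):
    """Sum over signal events of (f_l - 1) * g_li * g_lj - h_lij.

    grad   : (N_s, m) array of d ln P_s / d theta_i per signal event
    second : (N_s, m, m) array of d^2 ln P_s / d theta_i d theta_j
    purity : (N_s,) array of local purities f_l
    """
    outer = np.einsum("li,lj->lij", grad, grad)
    return np.sum((purity - 1.0)[:, None, None] * outer - second, axis=0)

def covariance(hessian):
    """Covariance matrix V obtained by inverting the Hessian matrix."""
    return np.linalg.inv(hessian)

# Hypothetical per-event derivatives for three signal events, two parameters.
grad = np.array([[1.0, 0.5], [2.0, -1.0], [0.5, 1.5]])
# For a p.d.f. linear in all parameters the second derivatives of ln P_s
# equal minus the outer product of the first derivatives.
second = -np.einsum("li,lj->lij", grad, grad)
purity = np.array([0.5, 1.0, 0.8])

H = hessian_from_signal(grad, second, purity)
V = covariance(H)
print(H)
print(np.sqrt(np.diag(V)))  # parameter errors
```

No background p.d.f. appears anywhere in the code: the background enters only through
the purity array.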



2.2 A simplified situation 

Very often we care about the error of only one physical parameter, denoted as $\theta_1$, in
an analysis and this parameter has weak and insignificant correlations with other fit
parameters. In this case an objective function to be maximized can be simply defined as
the inverse of the variance of the parameter $\theta_1$:

$$Q = \frac{1}{\mathrm{var}(\theta_1)} = H_{11} = \sum_{l=1}^{N_s} \left[ (f_l - 1) \left( \frac{\partial \ln P_s(X_l; \theta)}{\partial\theta_1} \right)^2 - \frac{\partial^2 \ln P_s(X_l; \theta)}{\partial\theta_1^2} \right] \qquad (8)$$

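If the signal p.d.f. is a linear function of $\theta_1$, so that
$\partial^2 \ln P_s / \partial\theta_1^2 = -(\partial \ln P_s / \partial\theta_1)^2$, this can be further simplified into

$$Q = \frac{1}{\mathrm{var}(\theta_1)} = \sum_{l=1}^{N_s} f_l \left( \frac{\partial \ln P_s(X_l; \theta)}{\partial\theta_1} \right)^2 \qquad (9)$$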






Only under the condition that all signal events contribute equally to the measurement
of $\theta_1$, which means that both the local purity and $\partial \ln P_s(X_l; \theta)/\partial\theta_1$ are constant, can the
objective function be reduced to the often misused form

$$Q \propto F \cdot N_s = \frac{N_s^2}{N_s + N_b} \qquad (10)$$

In general, maximizing $N_s^2/(N_s + N_b)$ will lead to a suboptimal solution except for a
single-bin counting analysis.
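
The relation between the per-event sum of Eq. (9) and the counting form of Eq. (10) can
be checked numerically. In the Python sketch below all yields, purities and derivative
values are hypothetical. The first case has equal contributions from all signal events,
and the second splits the same total yields into two zones of different local purity.

```python
import numpy as np

def q_linear(purity, dlnp):
    """Eq. (9): sum over signal events of f_l * (d ln P_s / d theta_1)^2."""
    return float(np.sum(purity * dlnp**2))

def q_counting(n_s, n_b):
    """Eq. (10) up to a constant factor: N_s^2 / (N_s + N_b)."""
    return n_s**2 / (n_s + n_b)

n_s, n_b, c = 100, 300, 2.0  # c is the (constant) derivative of ln P_s

# Case 1: constant local purity F = N_s / (N_s + N_b) for every signal event.
q_equal = q_linear(np.full(n_s, n_s / (n_s + n_b)), np.full(n_s, c))

# Case 2: same totals, but 50 signal events share a zone with 290 background
# events and the other 50 share a zone with only 10 background events.
purity_zones = np.concatenate([np.full(50, 50 / 340), np.full(50, 50 / 60)])
q_zones = q_linear(purity_zones, np.full(n_s, c))

print(q_equal, c**2 * q_counting(n_s, n_b))  # identical in case 1
print(q_zones)                               # larger in case 2
```

In the second case the counting form is blind to how the background is distributed,
while the per-event sum is not.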



2.3 Case study: CP violation measurement in B decays 



We now apply the derived formalism to optimization problems in CP violation studies of
neutral B decays, which usually require measuring the mixing-induced CP asymmetry S
in a time-dependent analysis. The signal p.d.f. is

$$P_s(t, q; S) = \epsilon(t) \cdot e^{-\Gamma t} \cdot \left( 1 + q \cdot (1 - 2\omega) \cdot S \cdot \sin(\Delta m \cdot t) \right) / I \qquad (11)$$

where the observables (t, q) denote the proper time and the B flavour tag, $I$ is a normalization
factor independent of the parameter S, $\epsilon(t)$ is an acceptance function describing the
reconstruction efficiency as a function of proper time, $\Gamma$ and $\Delta m$ are parameters whose
values are already measured in experiments, and $\omega$ denotes the wrong tagging probability with
a value in the range [0, 0.5].

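Since S is the only signal parameter in this case and the signal p.d.f. is a linear
function of S, we can define the objective function following Eq. (9):

$$Q = \frac{1}{\mathrm{var}(S)} = H_{SS} = \sum_{l=1}^{N_s} f_l \cdot (1 - 2\omega_l)^2 \cdot \left[ \frac{\sin(\Delta m \cdot t_l)}{1 + q_l \cdot (1 - 2\omega_l) \cdot S \cdot \sin(\Delta m \cdot t_l)} \right]^2 \qquad (12)$$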



It is worth noting that the shape of the acceptance function is irrelevant to evaluation of 
Q and thus we can ignore it in the signal p.d.f. during the optimization process. 
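
The per-event sum for Q and the irrelevance of the acceptance can be checked with a short
Python sketch. All numerical inputs and the acceptance shape are hypothetical. The
second function differentiates ln P_s numerically, including the acceptance and the
exponential decay factor, to show that these S-independent factors drop out of the
derivative with respect to S.

```python
import numpy as np

def q_cp(f, t, q, omega, s, delta_m):
    """Sum over signal events of f * [(1-2w) sin / (1 + q (1-2w) S sin)]^2."""
    dilution = 1.0 - 2.0 * omega
    sine = np.sin(delta_m * t)
    return float(np.sum(f * (dilution * sine / (1.0 + q * dilution * s * sine)) ** 2))

def dlnp_ds_numeric(t, q, omega, s, delta_m, gamma, step=1e-6):
    """Central finite difference of ln P_s with respect to S. The acceptance
    (a hypothetical shape) and exp(-gamma * t) do not depend on S."""
    def lnp(s_val):
        acceptance = 1.0 - np.exp(-2.0 * t)
        osc = 1.0 + q * (1.0 - 2.0 * omega) * s_val * np.sin(delta_m * t)
        return np.log(acceptance) - gamma * t + np.log(osc)
    return (lnp(s + step) - lnp(s - step)) / (2.0 * step)

# Hypothetical per-event inputs for four signal events.
t = np.array([0.5, 1.5, 3.0, 4.0])      # proper time
q = np.array([1, -1, 1, -1])            # flavour tag
omega = np.array([0.1, 0.3, 0.2, 0.4])  # wrong tagging probability
f = np.array([0.9, 0.5, 0.7, 0.3])      # local purity
S, DELTA_M, GAMMA = 0.7, 0.5, 0.66

q_analytic = q_cp(f, t, q, omega, S, DELTA_M)
q_numeric = float(np.sum(f * dlnp_ds_numeric(t, q, omega, S, DELTA_M, GAMMA) ** 2))
print(q_analytic, q_numeric)
```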

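So far we have not taken into account the effect of proper time resolution. Assuming
a single Gaussian resolution model, the signal p.d.f. becomes $P_s(t', q; S) \otimes G(t - t'; \sigma_t, 0)$,
where G is a Gaussian function with mean 0 and standard deviation $\sigma_t$. This effectively
introduces an attenuation factor into the sine term in Eq. (11) and changes Q into:

$$Q = \sum_{l=1}^{N_s} f_l \cdot (1 - 2\omega_l)^2 \cdot e^{-(\Delta m \cdot \sigma_{t,l})^2} \cdot \left[ \frac{\sin(\Delta m \cdot t_l)}{1 + q_l \cdot (1 - 2\omega_l) \cdot S \cdot e^{-(\Delta m \cdot \sigma_{t,l})^2/2} \cdot \sin(\Delta m \cdot t_l)} \right]^2 \qquad (13)$$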

Qualitatively speaking, events with larger local purity, smaller wrong tagging probability
and better proper time resolution contribute more to the sensitivity to S. This
is consistent with intuitive expectation.

We stress again that only if all factors in the per-event contribution to Q are constant
among signal events can the objective function Q be reduced to $N_s^2/(N_s + N_b)$. Sometimes
people try to maximize

$$Q = \frac{N_s^2}{N_s + N_b} \cdot (1 - 2\omega)^2 \cdot e^{-(\Delta m \cdot \sigma_t)^2} \qquad (14)$$

where $\omega$ and $\sigma_t$ are averaged over all accepted signal events. This is closer to the objective
function we propose, but can still be misleading if any of the factors in the per-event
contribution to Q varies strongly among signal events. At hadron collider experiments, the
proper time distributions of signal B decays and background B decays are very different,
with most background concentrated at $t \approx 0$ and signal having a much longer lifetime.
The local purity is close to zero at $t \approx 0$ and increases quickly with proper time. If one
uses certain cuts to suppress the $t \approx 0$ background at the price of losing some signal events
with large t, this may help maximize the oversimplified objective function in Eq. (14)
but decreases the correct one in Eq. (13), thus leading to a loss of sensitivity. This example
demonstrates that a correct form of the objective function should always be used in event
selection optimization.
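
The effect can be reproduced with a deliberately crude toy in Python. All numbers below
are hypothetical: signal events sit either in a low proper time zone with heavy
background or in a high proper time zone with none, and a cut removes most of the low
proper time background while also discarding half of the high proper time signal. The
per-event objective function includes the resolution attenuation factor, and the
averaged one uses the mean wrong tagging probability and resolution.

```python
import numpy as np

DELTA_M = 0.5  # hypothetical oscillation frequency (inverse time units)

def q_per_event(f, t, q, omega, sigma_t, s):
    """Per-event objective function with resolution attenuation."""
    dilution = 1.0 - 2.0 * omega
    damping = np.exp(-0.5 * (DELTA_M * sigma_t) ** 2)
    sine = np.sin(DELTA_M * t)
    ratio = sine / (1.0 + q * dilution * s * damping * sine)
    return float(np.sum(f * (dilution * damping * ratio) ** 2))

def q_averaged(n_s, n_b, omega_mean, sigma_t_mean):
    """Averaged objective function based on total yields only."""
    return float(n_s**2 / (n_s + n_b) * (1.0 - 2.0 * omega_mean) ** 2
                 * np.exp(-(DELTA_M * sigma_t_mean) ** 2))

def toy(n_low, n_high, b_low, b_high):
    """Signal in a low-t zone (t = 0.2) and a high-t zone (t = 3.0)."""
    t = np.concatenate([np.full(n_low, 0.2), np.full(n_high, 3.0)])
    f = np.concatenate([np.full(n_low, n_low / (n_low + b_low)),
                        np.full(n_high, n_high / (n_high + b_high))])
    per_event = q_per_event(f, t, q=1, omega=0.3, sigma_t=0.1, s=0.0)
    averaged = q_averaged(n_low + n_high, b_low + b_high, 0.3, 0.1)
    return per_event, averaged

before = toy(n_low=100, n_high=100, b_low=900, b_high=0)
after = toy(n_low=100, n_high=50, b_low=100, b_high=0)
print("per-event:", before[0], "->", after[0])  # decreases
print("averaged: ", before[1], "->", after[1])  # increases
```

The absolute values of the two objective functions are not comparable, since the
averaged form carries no sine factor. Only the direction of change under the cut matters
here.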



3 Implication on choosing event variables 

Eq. (7) and Eq. (9) clearly tell us that every signal event contributes to measurement
sensitivity. Then a natural question to ask is: why do we perform selection at all?

Indeed the best statistical precision is achieved when all possible information, including 
every signal event and every measured variable of each event, is used in data fitting.
However, it is impossible to do this in practice since the required signal and background 
model will be too complicated to manage and the resulting losses in systematic accuracy 
will far exceed the gains in statistical precision, even if we do not consider the requirement 
to reduce event rates imposed by resource constraints. Therefore it is necessary to make 
some compromises between systematic accuracy and statistical precision. 

According to how they are used in data analysis, event variables can be classified into 
three exclusive categories: fitting variables, binning variables and optimizeable variables. 

Fitting variables are variables that are described in the probability density function 
and used in data fitting. All variables needed for extraction of the physical parameters 
should be used as fitting variables. Variables that have power to separate signal from 
background and can be precisely modelled should also be used as fitting variables. The 
requirement for fitting variables to be modellable guarantees that no big biases on physical 
measurements will be induced due to inconsistency between fit model and data. 

Binning variables are variables that are used to divide the data samples. Variables 
that have power to separate signal from background but for which an accurate model is 
difficult to find can be used as binning variables. By dividing the data sample into bins in 
a multi-dimensional space and treating each bin as an independent subsample, no events
are dropped but local purity of signal events can still be enhanced. In practice only a
small number of binning variables are allowed in an analysis, otherwise the number of
subsamples will be too big and the complexity of the analysis will be unmanageable. For
this reason only variables that would suffer big variation of local purity if they were used as
optimizeable variables should be used as binning variables.
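
The gain from binning can be illustrated with a small Python example. It assumes, for
simplicity, that within each bin all signal events contribute equally, so that each bin
behaves like a counting analysis, and that the inverse variances of independent
subsamples add. The yields are hypothetical.

```python
def q_counting(n_s, n_b):
    """Counting-analysis objective N_s^2 / (N_s + N_b), valid within one bin."""
    return n_s**2 / (n_s + n_b)

# Hypothetical (signal, background) yields in two bins of a binning variable.
bins = [(80, 20), (20, 180)]

# Independent subsamples: the contributions to Q add up.
q_binned = sum(q_counting(n_s, n_b) for n_s, n_b in bins)

# Same events treated as a single sample with one global purity.
q_merged = q_counting(sum(n_s for n_s, _ in bins), sum(n_b for _, n_b in bins))

print(q_binned, q_merged)
```

No event is discarded in either case, but the binned treatment retains the purity
difference between the bins and therefore yields the larger Q.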

Optimizeable variables are variables that are neither fitting variables nor binning
variables. They can be used for background rejection. Cutting on an optimizeable variable
increases the power of some signal events at the price of losing some low-power signal
events. Here the power of a signal event refers to its contribution to the Hessian matrix 
elements. If the gain is greater than the loss, the measurement precision is improved. The 
goal of optimization is to find the selection criteria on these variables that minimize the 
measurement errors. That is why these variables are called optimizeable variables. Any 
optimizeable variable can in principle be used for event selection. In practice it is diffi- 
cult and unnecessary to use all of them. A set of optimizeable variables with the highest 
signal/background separation power should be carefully chosen, for which the selection 
criteria are subject to optimization. 
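
A cut scan on a single optimizeable variable can be sketched in Python as follows. This
is a toy under stated assumptions, not a recipe: x plays the role of a fitting variable,
y is the optimizeable variable, the signal p.d.f. is taken to be 1 + theta * (x - 0.5)
evaluated at theta = 0, so that the per-event power is (x - 0.5)^2, and the local purity
is recomputed after each cut by counting in bins of x. All distributions, yields and
names are hypothetical.

```python
import numpy as np

def q_after_cut(cut, x_s, y_s, w_s, x_b, y_b, edges):
    """Sum of f_l * w_l over signal events with y > cut, where the local
    purity f_l is recomputed from the surviving events in bins of x."""
    keep_s = y_s > cut
    keep_b = y_b > cut
    n_s, _ = np.histogram(x_s[keep_s], bins=edges)
    n_b, _ = np.histogram(x_b[keep_b], bins=edges)
    idx = np.clip(np.digitize(x_s[keep_s], edges) - 1, 0, len(edges) - 2)
    f = n_s[idx] / (n_s[idx] + n_b[idx])
    return float(np.sum(f * w_s[keep_s]))

rng = np.random.default_rng(1)
n_sig, n_bkg = 2000, 20000
x_s = rng.uniform(0.0, 1.0, n_sig)  # fitting variable, signal
y_s = rng.normal(1.0, 1.0, n_sig)   # optimizeable variable, signal
x_b = rng.uniform(0.0, 1.0, n_bkg)  # fitting variable, background
y_b = rng.normal(-1.0, 1.0, n_bkg)  # optimizeable variable, background
w_s = (x_s - 0.5) ** 2              # per-event power (d ln P_s / d theta)^2
edges = np.linspace(0.0, 1.0, 11)

cuts = np.linspace(-3.0, 3.0, 25)
q_values = np.array([q_after_cut(c, x_s, y_s, w_s, x_b, y_b, edges) for c in cuts])
best_cut = cuts[np.argmax(q_values)]
print(best_cut, q_values.max())
```

Each evaluation involves only histogramming and a weighted sum, with no fit, which is
what makes scanning many candidate selection criteria affordable.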

A selection optimization problem is well defined only if all the three types of variables 
are meaningfully identified. Before proceeding to define and maximize/minimize an ob- 
jective function, make sure that: the maximum information is used by including necessary 
variables as fitting and binning variables; and the most powerful optimizeable variables 
are used to separate signal from background. 

4 Conclusions 

This paper presents a new approach to optimizing event selection for high energy physics 
measurements. Rather than using sophisticated techniques to optimize a poorly motivated 
goal, this approach aims to directly minimize the statistical uncertainty in the physical 
measurements. A general formalism for quick and reliable error estimation is derived, and 
illustrated with the example of event selection optimization for CP violation measurement 
in B decays. The formalism not only makes direct error optimization possible, but also has 
immediate implication on choosing appropriate event variables for background rejection, 
data binning and data fitting. In conclusion, event selection optimization for high energy 
physics measurements can be and should be aimed at minimizing the errors on the physical 
results. 